在自主驾驶中,在使用深神经网络的爆炸中爆炸用于感知,预测和规划任务。由于自主车辆(AVS)更接近生产,多模态传感器输入和具有不同传感器平台的异构车队在该行业中变得越来越普遍。然而,神经网络架构通常是针对特定的传感器平台,并且对输入的变化并不稳健,使得缩放和模型部署的问题特别困难。此外,大多数玩家仍然将软件和硬件的问题视为完全独立的问题。我们提出了一个新的终端架构,广义传感器融合(GSF),其设计成使得传感器输入和目标任务都是模块化和可修改的。这使AV系统设计人员能够轻松地使用不同的传感器配置和方法进行实验,并使用在大型工程组织中共享的相同型号开辟了在异构船队上部署的能力。使用该系统,我们报告了实验结果,我们展示了昂贵的高密度(HD)激光雷达传感器的近似奇偶阶段,具有3D对象检测任务中的廉价低密度(LD)LIDAR加相机设置。这为行业铺平了道路,共同设计硬件和软件架构以及具有异质配置的大船队。
translated by 谷歌翻译
With growing sophistication and volume of cyber attacks combined with complex network structures, it is becoming extremely difficult for security analysts to corroborate evidences to identify multistage campaigns on their network. This work develops HeAT (Heated Alert Triage): given a critical indicator of compromise (IoC), e.g., a severe IDS alert, HeAT produces a HeATed Attack Campaign (HAC) depicting the multistage activities that led up to the critical event. We define the concept of "Alert Episode Heat" to represent the analysts opinion of how much an event contributes to the attack campaign of the critical IoC given their knowledge of the network and security expertise. Leveraging a network-agnostic feature set, HeAT learns the essence of analyst's assessment of "HeAT" for a small set of IoC's, and applies the learned model to extract insightful attack campaigns for IoC's not seen before, even across networks by transferring what have been learned. We demonstrate the capabilities of HeAT with data collected in Collegiate Penetration Testing Competition (CPTC) and through collaboration with a real-world SOC. We developed HeAT-Gain metrics to demonstrate how analysts may assess and benefit from the extracted attack campaigns in comparison to common practices where IP addresses are used to corroborate evidences. Our results demonstrates the practical uses of HeAT by finding campaigns that span across diverse attack stages, remove a significant volume of irrelevant alerts, and achieve coherency to the analyst's original assessments.
translated by 谷歌翻译
To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video dataset for fine-tuning. However, such paradigm is computationally expensive. Humans have the amazing ability to learn new visual concepts from just one single exemplar. We hereby study a new T2V generation problem$\unicode{x2014}$One-Shot Video Generation, where only a single text-video pair is presented for training an open-domain T2V generator. Intuitively, we propose to adapt the T2I diffusion model pretrained on massive image data for T2V generation. We make two key observations: 1) T2I models are able to generate images that align well with the verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we propose Tune-A-Video with a tailored Sparse-Causal Attention, which generates videos from text prompts via an efficient one-shot tuning of pretrained T2I diffusion models. Tune-A-Video is capable of producing temporally-coherent videos over various applications such as change of subject or background, attribute editing, style transfer, demonstrating the versatility and effectiveness of our method.
translated by 谷歌翻译
A self-supervised adaptive low-light video enhancement (SALVE) method is proposed in this work. SALVE first conducts an effective Retinex-based low-light image enhancement on a few key frames of an input low-light video. Next, it learns mappings from the low- to enhanced-light frames via Ridge regression. Finally, it uses these mappings to enhance the remaining frames in the input video. SALVE is a hybrid method that combines components from a traditional Retinex-based image enhancement method and a learning-based method. The former component leads to a robust solution which is easily adaptive to new real-world environments. The latter component offers a fast, computationally inexpensive and temporally consistent solution. We conduct extensive experiments to show the superior performance of SALVE. Our user study shows that 87% of participants prefer SALVE over prior work.
translated by 谷歌翻译
We propose a novel task, G4C (Goal-driven Guidance Generation in Grounded Communication), for studying goal-driven and grounded natural language interactions. Specifically, we choose Dungeons and Dragons (D&D) -- a role-playing game consisting of multiple player characters and a Dungeon Master (DM) who collaborate to achieve a set of goals that are beneficial to the players -- as a testbed for this task. Here, each of the player characters is a student, with their own personas and abilities, and the DM is the teacher, an arbitrator of the rules of the world and responsible for assisting and guiding the students towards a global goal. We propose a theory-of-mind-inspired methodology for training such a DM with reinforcement learning (RL), where a DM: (1) learns to predict how the players will react to its utterances using a dataset of D&D dialogue transcripts; and (2) uses this prediction as a reward function providing feedback on how effective these utterances are at guiding the players towards a goal. Human and automated evaluations show that a DM trained with RL to generate guidance by incorporating a theory-of-mind of the players significantly improves the players' ability to achieve goals grounded in their shared world.
translated by 谷歌翻译
Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning the leveraging between different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a sophisticated deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize the real-time optimal ESS planning under the curtailed renewable energy uncertainty. A quantitative performance comparison proved that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with a wide action and observation space. Owing to the uncertainty rejection capability of the DRL, we could confirm a robust performance, under a large uncertainty of the curtailed renewable energy, with a maximizing net profit and stable system. Action-mapping was performed for visually assessing the action taken by the DRL agent according to the state. The corresponding results confirmed that the DRL agent learns the way like what a human expert would do, suggesting reliable application of the proposed methodology.
translated by 谷歌翻译
Understanding the ambient scene is imperative for several applications such as autonomous driving and navigation. While obtaining real-world image data with per-pixel labels is challenging, existing accurate synthetic image datasets primarily focus on indoor spaces with fixed lighting and scene participants, thereby severely limiting their application to outdoor scenarios. In this work we introduce OmniHorizon, a synthetic dataset with 24,335 omnidirectional views comprising of a broad range of indoor and outdoor spaces consisting of buildings, streets, and diverse vegetation. Our dataset also accounts for dynamic scene components including lighting, different times of a day settings, pedestrians, and vehicles. Furthermore, we also demonstrate a learned synthetic-to-real cross-domain inference method for in-the-wild 3D scene depth and normal estimation method using our dataset. To this end, we propose UBotNet, an architecture based on a UNet and a Bottleneck Transformer, to estimate scene-consistent normals. We show that UBotNet achieves significantly improved depth accuracy (4.6%) and normal estimation (5.75%) compared to several existing networks such as U-Net with skip-connections. Finally, we demonstrate in-the-wild depth and normal estimation on real-world images with UBotNet trained purely on our OmniHorizon dataset, showing the promise of proposed dataset and network for scene understanding.
translated by 谷歌翻译
Medical treatments tailored to a patient's baseline characteristics hold the potential of improving patient outcomes while reducing negative side effects. Learning individualized treatment rules (ITRs) often requires aggregation of multiple datasets(sites); however, current ITR methodology does not take between-site heterogeneity into account, which can hurt model generalizability when deploying back to each site. To address this problem, we develop a method for individual-level meta-analysis of ITRs, which jointly learns site-specific ITRs while borrowing information about feature sign-coherency via a scientifically-motivated directionality principle. We also develop an adaptive procedure for model tuning, using information criteria tailored to the ITR learning problem. We study the proposed methods through numerical experiments to understand their performance under different levels of between-site heterogeneity and apply the methodology to estimate ITRs in a large multi-center database of electronic health records. This work extends several popular methodologies for estimating ITRs (A-learning, weighted learning) to the multiple-sites setting.
translated by 谷歌翻译
Segmentation of regions of interest (ROIs) for identifying abnormalities is a leading problem in medical imaging. Using Machine Learning (ML) for this problem generally requires manually annotated ground-truth segmentations, demanding extensive time and resources from radiologists. This work presents a novel weakly supervised approach that utilizes binary image-level labels, which are much simpler to acquire, to effectively segment anomalies in medical Magnetic Resonance (MR) images without ground truth annotations. We train a binary classifier using these labels and use it to derive seeds indicating regions likely and unlikely to contain tumors. These seeds are used to train a generative adversarial network (GAN) that converts cancerous images to healthy variants, which are then used in conjunction with the seeds to train a ML model that generates effective segmentations. This method produces segmentations that achieve Dice coefficients of 0.7903, 0.7868, and 0.7712 on the MICCAI Brain Tumor Segmentation (BraTS) 2020 dataset for the training, validation, and test cohorts respectively. We also propose a weakly supervised means of filtering the segmentations, removing a small subset of poorer segmentations to acquire a large subset of high quality segmentations. The proposed filtering further improves the Dice coefficients to up to 0.8374, 0.8232, and 0.8136 for training, validation, and test, respectively.
translated by 谷歌翻译
In recent years, Machine learning (ML) techniques developed for Natural Language Processing (NLP) have permeated into developing better computer vision algorithms. In this work, we use such NLP-inspired techniques to improve the accuracy, robustness and generalizability of ML models for simulating transient dynamics. We introduce teacher forcing and curriculum learning based training mechanics to model vortical flows and show an enhancement in accuracy for ML models, such as FNO and UNet by more than 50%.
translated by 谷歌翻译